home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Language/OS - Multiplatform Resource Library
/
LANGUAGE OS.iso
/
cocktail
/
doc.lha
/
doc.doc
/
trafo.doc
< prev
next >
Wrap
Text File
|
1992-09-25
|
59KB
|
1,469 lines
___________________________________________________________________
Transformation
of Attributed Trees
Using Pattern Matching
J. Grosch
___________________________________________________________________
___________________________________________________________________
GESELLSCHAFT FUeR MATHEMATIK
UND DATENVERARBEITUNG MBH
FORSCHUNGSSTELLE FUeR
PROGRAMMSTRUKTUREN
AN DER UNIVERSITAeT KARLSRUHE
___________________________________________________________________
Project
Compiler Generation
___________________________________________________________
Transformation of Attributed Trees Using Pattern Matching
Josef Grosch
Aug. 28, 1991
___________________________________________________________
Report No. 27
Copyright c 1991 GMD
Gesellschaft fuer Mathematik und Datenverarbeitung mbH
Forschungsstelle an der Universitaet Karlsruhe
Vincenz-Priesznitz-Str. 1
D-7500 Karlsruhe
1
Transformation of Attributed Trees Using Pattern Matching
Josef Grosch
GMD Forschungsstelle an der Universitaet Karlsruhe
Vincenz-Priesznitz-Str. 1, D-7500 Karlsruhe, Germany
+721-662226
grosch@karlsruhe.gmd.de
Abstract This paper describes a tool for the transformation of attributed
trees using pattern matching. The trees to be processed are defined by a for-
malism based on context-free grammars. Operations for trees such as composi-
tion and decomposition are provided. The approach can be characterized as an
amalgamation of trees or terms including pattern matching, with recursion,
attribute grammars, and imperative programming. Transformations can either
modify the input trees or map them to arbitrary output. Possible applications
are the various transformation tasks in compilers such as semantic analysis,
optimization, or the generation of intermediate representations. The design
goals have been to combine an expressive and high level technique for
transformation with flexibility, efficiency, and practical usability. A reli-
able development style is supported by static typing and checks for the single
assignment property of variables. We give some example transformations and
describe the input language of our tool called puma. The relationship to
similar work is discussed. Finally, experimental results are presented that
demonstrate the efficiency of our approach.
Keywords transformation, attributed trees, pattern matching
1. Introduction
The transformation of trees using pattern matching becomes an accepted tech-
nique. Several tools have been constructed recently that follow this princi-
ple [CoP90, HeS91, LMW89, Vol91]. Tools for code generation successfully use
the same technique, too [AGT89, ESL89]. We present a new tool called puma and
its input language for the transformation and manipulation of attributed trees
and graphs [Gro91a]. Puma stands for pattern matching and unification. Its
intended application areas are the various transformation tasks in a compiler
operating on abstract syntax trees or arbitrary graph structures. This
includes semantic analysis, optimization, intermediate code generation,
source-to-source translation, and eventually machine code generation.
The trees that are subject to pattern matching are described by a formal-
ism based on context-free grammars. The tree nodes may be associated with
attributes of arbitrary types. Node types are used to specify the properties
of tree nodes. An extension mechanism induces a subtype relation among the
node types. Pattern matching is extended to handle subtypes and attributes,
too. Operations for the composition and decomposition of trees are supported
by a concise notation.
The building blocks for a transformation are recursive subroutines, clas-
sified as procedures, functions, and predicates, with an arbitrary number of
2
input and output parameters. The bodies of the subroutines consist of rules
which are made up of patterns, conditions, statements, and expressions. The
first two components control the applicability of a rule. The statements
determine what has to be done whenever a rule is applicable. The expressions
provide values for the output parameters and the function result.
Static type checking with respect to trees is provided. Variables are
declared implicitly and they are checked for the single assignment property.
There is read and write access to attributes stored in the tree which allows
the construction of attribute evaluators. In all places it is possible to
escape to hand-written code which provides the power and flexibility of the
imperative programming style.
The output of the generator is a source module in one of the target
languages C or Modula-2. This module allows for easy integration and coopera-
tion with other modules, either hand-written or generated ones. The pattern
matching is local and considers a region at the top of the current subtree,
only. It is implemented by direct code and therefore efficient.
Our approach can be regarded from several points of view: From the point
of view of imperative programming it is an extension by statically typed,
attributed trees, constructs for composition and decomposition, and pattern
matching. From the point of view of logic programming it omits backtracking
and restricts pattern matching to one of the two terms being a ground term.
It adds attributes which are stored in the terms (trees), static typing, input
and output modes for parameters, and an easy escape to imperative features.
From the point of view of functional programming it offers the simple style of
functional programming which has always been present in imperative languages
having functions and recursion. It adds the pattern matching facility. From
the point of view of attribute grammars it allows the specification of attri-
bute evaluation with explicit control of the evaluation order or visit
sequences. This eases the use of global attributes and gives full control on
side-effects.
The intended use of this tool proceeds in three steps: First, a tree is
constructed either by a parser, a previous transformation phase, or whatever
is appropriate. Second, the attributes in the tree are evaluated either using
an attribute grammar based tool, by a puma specified tree traversal and attri-
bute computations, or by hand-written code. Third, the attributed tree is
transformed or mapped to another data structure by a puma generated transfor-
mation module. These steps can be executed one after the other or more or
less simultaneously. Besides trees, puma can handle attributed graphs as
well, even cyclic ones. Of course the cycles have to be detected in order to
avoid infinite loops. A possible solution uses attributes as marks for nodes
already visited.
A transformer module can make use of attributes in the following ways: If
attribute values have been computed by a preceding attribute evaluator and are
accessed in read only mode then this corresponds to the three step model
explained above. A puma generated module can also evaluate attributes on its
own. A further possibility is that an attribute evaluator can call puma sub-
routines in order to compute attributes. This is especially of interest when
attributes depend on tree-valued arguments.
3
The tool supports two classes of tree transformations: mappings and
modifications. Tree mappings map an input tree to arbitrary output data. The
input tree is accessed in read only mode and left unchanged. Tree modifica-
tions change a tree by e. g. computing and storing attributes at tree nodes or
by changing the tree structure. In this case the tree data structure serves as
input as well as output and it is accessed in read and write mode.
The first class covers applications like the generation of intermediate
languages or machine code. Trees are mapped to arbitrary output like source
code, assembly code, binary machine code, linearized intermediate languages
like P-Code, or another tree structure. A further variant of mapping is to
emit a sequence of procedure calls which are handled by an abstract data type.
The second class covers applications like semantic analysis or optimiza-
tion. Trees are decorated with attribute values, properties of the trees
corresponding to context conditions are checked, or trees are changed in order
to reflect optimizing transformations.
Puma is part of the Karlsruhe Toolbox for Compiler Construction [GrE90].
In particular it cooperates with the generator for abstract syntax trees ast
[Gro91b] and the attribute evaluator generator ag [Gro89]. The attributed
trees are defined and managed by a module generated with ast. A second module
generated by puma creates and handles these trees. This way all the powerful
operations for trees and graphs provided by ast are available such as reader
and writer procedures or the interactive browser. For sake of simplicity we
will deviate from reality in this paper and treat the definition of the tree
structure as part of puma.
The rest of this paper is organized as follows: Section 2 presents a few
simple examples of how to describe transformations with puma. Section 3
describes the input language of the tool. Section 4 sketches the implementa-
tion of the generated transformer module. Section 5 compares our approach
with related work. Section 6 presents experimental results. Section 7 con-
tains concluding remarks.
2. Tree Transformation by Pattern Matching
The probably easiest way to get an impression of our approach can be obtained
by having a look at a few introductory examples. We will use the abstract syn-
tax of simple arithmetic expressions as input data structure. Besides a few
intrinsic attributes describing e. g. the values of constants we use an attri-
bute called Type. It describes the type of every subexpression. Its domain
are trees, too. The tree definition based on a context-free grammar shown in
Example 1 specifies the structure of expressions and types.
4
Example 1: Tree Definition
Expr = Type <
Plus = Lop: Expr Rop: Expr .
Minus = Lop: Expr Rop: Expr .
Const = [Value] .
Adr = <
Index = Adr Expr .
Select = Adr [Ident: tIdent] .
Ident = [Ident: tIdent] .
>.
>.
Type = <
Int = .
Real = .
Bool = .
Array = [Lwb] [Upb] Type .
Record = Fields .
>.
Fields = <
NoField = .
Field = [Ident: tIdent] Type Fields .
>.
The names before the character '=' can be regarded both as rule names or
nonterminals. The possible right-hand sides for one nonterminal are enclosed
in angle brackets '<' and '>'. Non-tree valued attributes are enclosed in
square brackets '[' and ']'. The attributes Lwb, Upb, and Value are of the
default type int. The attribute Ident is of the user-defined type tIdent.
The tree-valued attribute Type of the rule named Expr is written like every
other right-hand side nonterminal. The selector names Lop and Rop used in the
rules called Plus and Minus allow symbolic access to right-hand side elements
having the same type.
The association of an attribute such as Type with a nonterminal like Expr
adds this attribute to every right-hand side belonging to this nonterminal.
Hence the rules Plus and Minus have three right-hand side elements.
The subroutine specification presented in Example 2 describes part of a
transformation of expression trees to P-Code [NAJ76]. The procedure P_Code
performs a recursive tree traversal. Its body consists of a sequence of rules.
Every rule is made up of a pattern and a list of statements separated by ':-'.
Whenever a pattern matches against the input parameter of the subroutine, the
Example 2: Generation of P-Code
PROCEDURE P_Code (Tree)
Plus (Int (), Lop, Rop) :- P_Code (Lop); P_Code (Rop); Emit (ADDI); .
Plus (Real (), Lop, Rop) :- P_Code (Lop); P_Code (Rop); Emit (ADDR); .
Minus (Int (), Lop, Rop) :- P_Code (Lop); P_Code (Rop); Emit (SUBI); .
Minus (Real (), Lop, Rop) :- P_Code (Lop); P_Code (Rop); Emit (SUBR); .
5
associated statement list is executed. The parameter of the procedure P_Code
is of type Tree which means that every tree according to the tree definition
is a legal argument. The procedure Emit is supposed to be an external subrou-
tine that performs output.
Example 3: Computation of Type Sizes
FUNCTION TypeSize ([Type, Fields]) int
Int () RETURN 4 .
Real () RETURN 4 .
Bool () RETURN 1 .
Array (Lwb, Upb, T) RETURN (Upb - Lwb + 1) * TypeSize (T) .
Record (F) RETURN TypeSize (F) .
Field (_, T, F) RETURN TypeSize (T) + TypeSize (F) .
NoField () RETURN 0 .
The function TypeSize in Example 3 transforms or maps trees to integer
values. It is defined to operate on trees of types Type and Fields and it
returns a value of a type named int. Again this subroutine performs a recur-
sive tree traversal. Instead of producing a side-effect like in the previous
example it represents a pure functional mapping and demonstrates the arith-
metic expression capabilities of puma. The character '_' denotes a so-called
don't care pattern.
Example 4: Testing two Types for Compatibility
PREDICATE IsCompatible ([Type, Fields], [Type, Fields])
Int () , Int () .
Real () , Real () .
Bool () , Bool () .
Array (Lwb, Upb, T1), Array (Lwb, Upb, T2) :- IsCompatible (T1, T2); .
Record (F1) , Record (F2) :- IsCompatible (F1, F2); .
Field (_, T1, F1) , Field (_, T2, F2) :- IsCompatible (T1, T2);
IsCompatible (F1, F2); .
Example 4 presents the predicate IsCompatible which can be seen as a
boolean function. It operates on two parameters of types Type or Fields.
Accordingly, every rule has two patterns for matching against the two argu-
ments. The first three rules lead to a return value of true if both patterns
match both arguments. The fourth rule returns true if the attributes Lwb and
Upb of the two argument trees match (have the same value) and if the recursive
call IsCompatible returns true for the two types T1 and T2 of the array ele-
ments. If none of the rules matches the predicate returns false.
The subroutine given in Example 5 has three input and one output parame-
ter. It computes the result type of a binary expression which depends on the
types of the operands and on the operator. The third pattern of every rule is
enclosed in curly brackets '{ and '}'. This represents so-called target code
which is more or less passed unchanged and unchecked to the generated module.
In this case opPlus and opMinus are named constants encoding operators.
Without the curly brackets they would be treated as pattern variables. The
6
Example 5: Procedure with Input and Output Parameters
PROCEDURE ResultType (Type, Type, Operator: int => Type)
Int () , Int () , { opPlus } => Int () .
Real () , Real () , { opPlus } => Real () .
Int () , Int () , { opMinus } => Int () .
Real () , Real () , { opMinus } => Real () .
expression after the symbol '=>' describes the value of the output parameter
in case of a successful match. It consists of a tree constructor which creates
a tree having one node.
3. Specification Language
The description of a tree transformation consists of the definition of attri-
buted trees and a set of subroutines.
3.1. Tree Structure
The structure of attributed trees or (directed) graphs is specified by a for-
malism based on context-free grammars. However, we primarily use the terminol-
ogy of trees and types, here. A tree consists of nodes. A node may be related
to other nodes in a so-called parent-child relation. Then the first node is
called a parent node and the latter nodes are called child nodes. Nodes
without a parent node are usually called root nodes, nodes without children
are called leaf nodes.
The structure and the properties of nodes are described by node types.
Every node belongs to a node type. A specification of a tree describes a fin-
ite number of node types. A node type specifies the names of the child nodes
and the associated node types as well as the names of the attributes and the
associated attribute types.
Children are distinguished by selector names which have to be unambiguous
within one node type. The children are of a certain node type. Example:
Plus = Lop: Expr Rop: Expr .
Index = Adr: Adr Expr: Expr .
The example introduces two node types called Plus and Index. A node of type
Plus has two children which are selected by the names Lop and Rop. Both chil-
dren are of the node type Expr. If a selector name is equal to the associated
name of the node type it can be omitted. Therefore, the node type Index can be
abbreviated as follows:
Index = Adr Expr .
As well as children, every node type can specify an arbitrary number of
attributes of arbitrary types. Like children, attributes are characterized by
a selector name and a certain type. The descriptions of attributes are
enclosed in brackets '[' and ']'. The attribute types are given by names taken
7
from the target language. Missing attribute types are assumed to be int or
INTEGER depending on the target language (C or Modula-2). Children and attri-
butes can be given in any order (see Example 1).
To allow several alternatives for the types of children an extension
mechanism is used. A node type may be associated with several other node types
enclosed in angle brackets '<' and '>'. Then this node type is called base or
super type and the associated types are called derived types or subtypes. A
derived type can in turn be extended with no limitation of the nesting depth.
The extension mechanism induces a subtype relation between node types denoted
by the symbol . This relation is transitive. Where a node of a certain node
type is required, either a node of this node type or a node of a subtype
thereof is legal.
In Example 1 Expr is a base type describing nodes with one child called
Type. The node type Expr has four derived types called Plus, Minus, Const,
and Adr. The node type Adr is in turn extended by three derived types called
Index, Select, and Ident. Where a node of type Expr is required, all men-
tioned node types are legal. Where a node of type Adr is required, nodes of
the types Index, Select, or Ident are legal. Where a node of type Index is
required, nodes of type Index are legal, only. The subtype relation is the
transitive and reflexive closure of: Plus Expr, Minus Expr, Const Expr, Adr
Expr, Index Adr, Select Adr, Ident Adr.
Besides extending the set of legal node types, the extension mechanism
has the property of extending the children and attributes of the base type.
The derived types possess the children and attributes of the base type. They
may define additional children and attributes. In other words they inherit the
structure of the base type. The selector names of all children and attributes
in an extension hierarchy have to be distinct. The syntax has been designed
this way in order to allow single inheritance, only.
In Example 1 nodes of type Expr have one child selected by the name Type.
Nodes of type Plus have three children with the selector names Type, Lop, and
Rop.
3.2. Subroutines
A set of subroutines constitutes the main building blocks of a transformation.
Like in programming languages, subroutines are parameterized abstractions of
statements or expressions. There are three kinds of subroutines:
procedure: a subroutine acting as a statement
function: a subroutine acting as an expression and returning a value
predicate: a boolean function
Subroutines are specified according to the following syntax:
8
Subroutine = Header { Rule }
Header = PROCEDURE Ident ( [ Parameters ] [ => Parameters ] )
| FUNCTION Ident ( [ Parameters ] [ => Parameters ] ) Type
| PREDICATE Ident ( [ Parameters ] [ => Parameters ] )
Parameters = [ Ident : ] Type { , [ Ident : ] Type }
A subroutine consists of a header and a sequence of rules. The header speci-
fies the kind of the subroutine, its name, and its parameters. In case of a
function, the type of the result value is added. Input and output parameters
are separated by the symbol '=>'. It suffices to give the type of a parame-
ter. A name for the formal parameter is optional.
3.3. Types
Types are either predefined in the target language like int and INTEGER, or
user-defined like MyType, or they are tree types like Expr. A tree type is
described by the name of a tree definition, a single node type, or a list of
node types enclosed in brackets '[' and ']'. In case of ambiguities the latter
two kinds may be qualified by preceding the name of the tree definition.
Type = TreeType | UserType
TreeType = Ident | [ Ident . ] Ident | [ Ident . ] '[' Idents ']'
UserType = Ident
Idents = Ident { , Ident }
3.4. Rules
A rule behaves like a branch in a case or switch statement. It consists of a
list of patterns, a list of expressions, a return expression in case of a
function, and a list of statements.
Rule = [ Patterns ] [ => Exprs ] [ RETURN Expr ] :- { Statement ; } .
Patterns = Pattern { , Pattern }
Exprs = Expr { , Expr }
The semantics of a rule is as follows: A rule may succeed or fail. It
succeeds if all its patterns, statements, and expressions succeed - otherwise
it fails. The patterns, statements, and expressions are checked for success in
the following order: First, the patterns are checked from left to right. A
pattern succeeds if it matches its corresponding input parameter as described
below. Second, the statements are executed in sequence as long as they
succeed. The success of statements is defined below. Third, the expressions
are evaluated from left to right and their results are passed to the
corresponding output parameters. In case of a function, additionally the
expression after RETURN is evaluated and its result is returned as value of
the function call. The success of expressions is defined below, too. If all
elements of a rule succeed then the rule succeeds and the subroutine returns.
If one element of a rule fails the process described above stops and causes
the rule to fail. Then the next rule is tried. This search process continues
until either a successful rule is found or the end of the list is reached. In
9
the latter case the behaviour depends on the kind of the subroutine: A pro-
cedure does nothing, a predicate returns false, and a function signals a run-
time error. There is one exception to this definition of the semantics which
is explained later.
3.5. Patterns
A pattern describes the shape at the top or root of a subtree. A pattern can
be a decomposition of a tree, the keyword NIL, a label or a variable, one of
the don't care symbols '_' or '..', or an expression. A decomposition is writ-
ten as a node type followed by a list of patterns in parenthesis '(' and ')'.
It may be optionally preceded by a label.
Pattern = [ Ident : ] Ident ( [ Patterns ] ) | NIL | Ident | _ | .. | Expr
The match between a pattern and a value is defined recursively depending on
the kind of the pattern:
A decomposition with a node type t matches a tree u with a root node of
type s if s is a subtype of t (s t) and all subpatterns of t match their
corresponding subtrees or attributes of u. If the node type is preceded by a
label l then a binding is established between l and u which defines the label
l to refer to the tree u.
The first occurrence of a label l in a rule matches an arbitrary subtree
or attribute value u. All further occurrences of the label l within patterns
of this rule match a subtree or an attribute value v only if u is equal to v.
The equality for trees is defined in the sense of structural equivalence. Two
attributes are equal if they have the same values. A binding is established
between l and u which defines the label l to refer to the value u. The label
can be used later to access the associated value.
The pattern NIL matches the values NULL or NIL. The don't care symbol
'_' matches one arbitrary subtree or attribute value. The don't care symbol
'..' matches any number of arbitrary subtrees or attribute values. An expres-
sion matches a parameter or an attribute if both have the same values.
3.6. Expressions
Expressions denote the computation of values or the construction of trees.
Binary and unary operations as well as calls of external functions are written
as in the target language. Calls of puma functions and predicates distinguish
between input and output arguments. The syntax for tree composition is simi-
lar to the syntax of patterns.
Expr = Ident ( [ Exprs ] ) | NIL | Ident | _ | ..
| Ident ( [ Exprs ] [ => Patterns ] )
The semantics of the different kinds of expressions is as follows:
10
A node type creates a tree node and provides the children and attributes
of this node with the values given in parenthesis. NIL represents the value
NULL or NIL. A label refers to the expression it was bound to upon its defin-
ition.
A function or predicate call must be compatible with the corresponding
definition in terms of the numbers of expressions and patterns as well as
their types. A function call evaluates the expressions corresponding to input
parameters, passes the results to the function, and executes the function.
Upon return from the function the result value of the function determines the
result of this expression. The values of the output parameters that the func-
tion returns are matched against the actual patterns of the function call. If
one pair does not match the call fails. Labels in the patterns may establish
bindings that enable to refer to the output parameters or subtrees thereof.
The don't care symbols specify that no computation should be executed,
either for one or for several expressions. The result values are undefined.
The binary and unary operators (prefix and postfix) of the target language as
well as array indexing, parentheses, numbers, and strings are known to puma.
They are passed unchanged to the output.
In case of node types, labels for tree values, and functions returning
tree values, puma does type checking. For user types, target code expressions,
or target operators no type checking is done by puma but later by the com-
piler. An expression that does not contain calls of puma functions or predi-
cates always succeeds. An expression containing those calls succeeds if all
the calls succeed - otherwise it fails.
3.7. Statements
Statements are used to describe conditions, to perform output, to assign
values to attributes, and to control the execution of the transformer via
recursive subroutine calls. A statement is either a condition denoted by an
expression, a call of a procedure, an assignment, one of the keywords REJECT
or FAIL, or a target code statement. Every kind of statement may succeed or
fail as described below.
Statement = Expr | Ident ( [ Exprs ] [ => Patterns ] ) | Ident := Expr
| REJECT | FAIL | [ Declarations ] TargetCode
Declarations = Parameters
Conditions are denoted by expressions and can be used to determine pro-
perties that can not be expressed with pattern matching alone. Patterns
describe either shapes of a fixed size of a tree or the equality between two
values. Properties of trees of unlimited size and relations like '<', '<='
etc. have to be checked with conditions. The expression has to be of type
boolean or the call of a predicate. A condition succeeds if the expression
evaluates to true - otherwise it fails.
For a procedure call the same rules as for a function call apply. It
succeeds if the values of all output parameters are matched by the correspond-
ing patterns - otherwise it fails.
11
An assignment statement evaluates its expression and stores this value at
the entity denoted by the identifier on the left-hand side. The identifier can
denote a global or a local variable, an input or an output parameter, as well
as a label for an attribute or a subtree. An assignment statement succeeds if
the expression succeeds - otherwise it fails.
The statement REJECT does nothing but fail. This way the execution of the
current rule terminates and control is passed to the next rule. The statement
FAIL causes the execution of the current subroutine to terminate. This state-
ment is allowed in procedures and predicates, only. Depending on the kind of
subroutine the following happens: A procedure terminates and a predicate
returns false.
A target code statement which is enclosed in curly brackets '{' and '}'
is executed as in the target language. It can be used to define labels by
means of implementation language code or calls of external subroutines. In
this case the names of the labels and their types have to be defined expli-
citly. This is done by declarations written in the syntax of a parameter list
that precede the target code statement. A target code statement always
succeeds.
Note, statements and expressions may cause side effects by changing e. g.
global variables, local variables, the input tree, or by producing output.
Those side effects are not undone when a rule fails.
Further features such as various possibilities concerning the use of
hand-written target code, named patterns instead of positional ones, or the
definition of the equality or matching operation between attributes by a macro
mechanism are omitted for the sake of brevity.
4. Implementation
From a given specification, puma generates a program module in one of the tar-
get languages C or Modula-2 implementing the desired transformation. The sub-
routines in the sense of puma are mapped to subroutines in the target
language. Procedures yield procedures, functions yield functions that return a
value, and predicates yield boolean functions. These subroutines can be called
from other modules using the usual subroutine call syntax of the target
language provided they are exported: All arguments are separated by commas -
the symbol '=>' as separator between input and output arguments is only
required in calls processed by puma.
The types of the parameters are treated as follows: Predefined types or
user defined types remain unchanged. Node types or sets of node types are
replaced by the name of the corresponding tree type. This is a pointer to a
union of record types. Input parameters are passed by value and output parame-
ters are passed by reference.
The rules of a subroutine are treated like a comfortable case or switch
statement. The code generated for pattern matching is relatively simple. A
naive implementation would just use a sequence of if statements. This kind of
code showed to be already rather efficient. puma optimizes the code with
respect to the elimination of common tests for patterns and the clever use of
12
switch statements. Furthermore, tail recursion can be turned into iteration.
Labels are replaced by access paths to the associated values. The code for
the construction of tree nodes is inserted in-line. It is therefore efficient
because no procedure calls are necessary for the creation of tree nodes.
5. Related Research
In this section we compare our approach with several programming paradigms and
similar tools. The intention is to provide further insight in the nature of
our tool by looking at styles or work the reader might be familiar with.
5.1. Imperative Programming
Our approach can be regarded as an extension of imperative languages by a type
constructor for trees. Operations on trees for the composition and decomposi-
tion are provided using a concise term notation. The storage management for
trees is completely automatic. Intermediate variables for output parameters of
subroutines are declared implicitly and checked for the single assignment pro-
perty. Pattern matching represents a comfortable kind of case or switch state-
ment tailored towards processing of trees. This imperative view is reflected
best by the implementation of the generator puma. It can be seen as a prepro-
cessor that provides attributed trees and pattern matching for imperative
languages.
5.2. Logic Programming
From the stand point of logic programming languages such as Prolog [ClM84] our
approach is relatively restricted and therefore one cannot classify puma as a
logic programming language. Puma has no backtracking, no logic variables, and
nothing like assert and retract. The unification is unidirectional and res-
tricted to one of the two terms being a ground term. Similar are the syntax
and the presence of predicates. Puma retains the "imperative subset" of Pro-
log. It turns out that this subset can be implemented efficiently and it suf-
fices to specify most of the transformation problems in compilers [Paa89].
Compared to Prolog the following features have been added: Besides predi-
cates there are procedures, functions, and customary expressions. This allows
for recursive functions and easy calls of puma subroutines from hand-written
code. Attributes can be stored in the tree and accessed in read and write
mode. The terms or trees are statically typed. For parameters the modes input
and output are distinguished. Pattern matching is extended to cope with sub-
types and attributes. The modification of input terms is possible. Easy
escape to hand-written code and cooperation with other modules, either hand-
written or generated, gives great flexibility and "imperative power". The
transformer modules including the pattern matching are translated to direct
code and therefore efficient.
5.3. Functional Programming
Compared to modern functional programming languages such as Hope [BMS80], ML
[Mil85], or Miranda [Tur85] our approach can of course not claim to be func-
tional. There is nothing like under or over supply of functions or higher-
order functions. It just supports the restricted kind of functional
13
programming that can be found in imperative languages having functions and
recursion like e. g. C or Pascal. Even the function results are restricted to
simple data types and pointers. However, the latter allow functions to return
trees and graphs. With respect to the traditional functional language Lisp
[MAE65] our approach might be worth to be compared. Whereas Lisp supports
binary trees only, puma provides tree nodes with an arbitrary number of sub-
trees and attributes. Dynamic typing and binding have been replaced by their
static counterparts. The pattern matching facility has been added.
5.4. Attribute Grammars
Our technique is related to attribute grammars [Knu68, Knu71] in several ways.
Compared to pure attribute grammar processors such as [Gro89, JoP90, KHZ82]
there are some restrictions. In pure attribute grammar systems the attribute
computations are written in a functional notation. Knowing the dependencies
among the attributes allows the automatic (implicit) determination of an
evaluation order for the attributes (visit sequences). Furthermore it is pos-
sible to check an attribute grammar for completeness and non-circularity. Our
approach is based on an explicit evaluation order using hand-written visit
sequences. Three kinds of data can be regarded as attributes: The "real"
attributes stored in the tree, the parameters of the subroutines, and global
variables. This classification of the attributes is done by the user. Whereas
pure attribute grammars allow only computations local to a grammar rule, the
pattern matching facility makes computations on larger tree regions of fixed
size feasible. Considering attributes of the kind parameter and non-nested
patterns only, our approach is similar to affix grammars [Kos71, Kos77] or
extended affix grammars [Wat74]. This holds from the syntactic as well as
from the semantic point of view. A puma generated module can contribute to
attribute evaluation in several ways: First, the evaluation can be carried out
completely by a transformation module itself using one or more subroutines for
traversing the tree. Second, puma subroutines can be called as external sub-
routines from an attribute evaluator to compute attributes - especially those
depending on tree-valued arguments. The imperative or sequential style of
puma allows simple control on side effects and easy production of arbitrary
output.
5.5. Optran
The transformation tool Optran [LMW89] has been designed for optimizing
transformations. It is based on an attribute grammar and modifies an input
tree according to a given set of rules. A rule consists of a pattern, condi-
tions, and action statements. The actions can replace the tree part matched by
the pattern (where the unmatched subtrees might be reused) and compute new
attribute values. Optran emphasizes the automatic reevaluation of attributes
[LMO88] in order to arrive at attribute values consistent with the specified
attribute grammar. As the goals of Optran are rather ambitious they pose
several problems: In which order will the rules be applied and how to find the
locations in the tree where a pattern matches? When will the reevaluation of
attributes be carried out? The latter question is interesting because in the
worst case it will be necessary to recompute many attributes in a large part
of the tree and therefore consume a considerable amount of run time.
14
Our technique groups rules into an arbitrary number of subroutines. Every
routine may perform pattern matching on an arbitrary number of trees supplied
as arguments not just one. We explicitly control the execution order thus
gaining an efficient and detailed way to describe where and when what should
happen. Puma is not concerned with reevaluation of attributes. Besides modif-
ication of the input tree it also allows for mappings that produce arbitrary
output.
5.6. Trafola
The functional language Trafola-H [HeS91] was designed to be a specification
language for program transformations. Besides all the features one would
expect from a modern functional language it has powerful constructs for pat-
tern matching. For example pattern matching is extended to cope with tuples
and sequences. This introduces nondeterminism which is handled using back-
tracking. The language is statically typed and offers type polymorphism. Pat-
tern matching is implemented by a table-driven bottom up tree automata or by a
backtracking algorithm. The functional constructs are usually implemented by
interpretation of an internal abstract machine. Our approach does not aim to
be functional but it retains the flexibility and efficiency of imperative pro-
gramming. Whereas the Trafola type constructor for sequences allows the gram-
mar for legal trees to be written in extended BNF, puma supports pure BNF,
only. Both approaches allow so-called non linear patterns where a variable
occurs more than once in a pattern. Puma has extended pattern matching to
handle subtypes.
5.7. Codegenerator-Generators
Tools for the generation of code generators such as Twig [AGT89] or Beg
[ESL89] often use tree pattern matching, too. In addition to a pattern, a con-
dition, and an action a rule is associated with costs. Pattern matching usu-
ally performs a global match on the complete input tree in order to find a
complete cover of the tree where the sum of all participating rules is of
minimal cost. Algorithms based on dynamic programming have proved to yield
satisfactory results. These tools are oriented towards code selection by
transforming a tree-like intermediate representation into machine code. There-
fore there is de facto only one subroutine with one tree-valued argument. Beg
provides support for register allocation, has a fixed traversal scheme (post
order), and generates directly coded code-generators. Twig allows for the
modification of the input tree, supports arbitrary traversal strategies, and
generates table-driven code-generators.
5.8. Txl
The tree transformation tool txl [CaC91, CoP90] processes concrete syntax. It
comprises a parser and an unparser for input and output of trees and allows
the generation of source-to-source translators. The tree transformation is
described by a set of rules consisting of a pattern, a condition, and a
replacement. Usually the user does not have to take care about the order or
location of rule applications. Rules are applied as long as there is one that
matches. Thereby the tree is modified. There are means to describe which set
of rules should be applied to which subtrees. Patterns are not written in a
nested fashion but as strings containing nonterminals. The tool parses these
15
strings in order to recognize the nesting structure. As puma is oriented
towards abstract trees, the transformers are independent of parsing or unpars-
ing. Besides tree modifications puma allows for tree mappings producing arbi-
trary output. Another difference is the explicit control of the execution
order which allows an efficient implementation of the generated transformers.
6. Experiences
In a first real world application, puma was successfully used to generate a
code-generator in a compiler for the robot control language IRL [IRL92]. The
front-end of this compiler constructs an abstract syntax tree which is
decorated with attributes during the semantic analysis phase. The code-
generator maps this attributed tree to the standardized robot control code
IRDATA [IRD91] which is comparable to assembly code. The strongly typed nature
of IRDATA makes code-generation harder than one would expect. For instance it
is impossible to implement record variables other than by turning every field
into a separate variable. More complex types such as arrays with elements of a
record type have to be transformed into records containing arrays. Neverthe-
less these nontrivial problems could be solved relatively easy with appropri-
ate transformation rules. The code-generator uses a two phase scheme in order
to deal with forward references. The first phase selects IRDATA instructions
and stores them in memory. The second phase replaces symbolic labels by abso-
lute ones, encodes the instructions, and writes the final code to an output
file. The data structure for storing the instructions in memory is managed by
a module generated by the tool ast. The development time was three months.
Table 1: Sizes of the Code-Generator Parts
____________________________________________________________________
Specification C Code Binary Code
[lines] [lines] [KB]
____________________________________________________________________
code selection (puma) 3032 8600 111
instruction storage (ast) 165 3430 36
output (hand-written) - 700 7
____________________________________________________________________
total 3197 12730 154
____________________________________________________________________
The parts of the code-generator have the sizes listed in Table 1. The
figures stem from measurements on a SPARC station ELC. The complete compiler
runs at speed of 1000 lines per second. The run time is distributed among the
phases as follows: Scanning and parsing takes 21 %, semantic analysis 10 %,
code-generation 18 %, and output of the generated code 51 %. The huge share
for the output comes from the voluminous nature of IRDATA. The output reaches
four times the size of the source program.
7. Conclusions
We presented a tool for the transformation of attributed trees using pattern
matching. It is based on the definition, composition, and decomposition of
trees in combination with recursion and pattern matching for trees as well as
16
for attributes. It supports the mapping of trees to arbitrary output as well
as the modification of trees. Moreover it can be used to construct attribute
evaluators. The easy escape to an imperative programming language assures
flexibility and practical usability. The tool has been designed to provide an
expressive technique that can be implemented efficiently.
The tool puma and its predecessor called Gentle [WGS89] have been used
successfully in several real world projects. At our institution a front-end
for a C compiler and a compiler for a functional language have been generated.
An industrial company makes use of it for a language implementation project,
too. All applications report their satisfaction with this approach and rela-
tive short development times. The interesting question is where does this suc-
cess come from? The final paragraphs give the author's subjective opinion.
First, there are several aspects that support a high level programming
style. The data type tree including operations for composition and decomposi-
tion replaces the handling of pointers, records, and dynamic storage alloca-
tion. Pattern matching offers a comfortable kind of case statement. Once it
is understood and accepted then recursion seems to be easier to deal with then
iteration. The implicit declaration of variables simplifies coding. The
check for the single assignment property catches errors such as missing or
multiple computations. A concise syntactic notation is probably more profit-
able than one would imagine. (Many of the the mentioned arguments have con-
tributed to the success of Lisp and Prolog.) The approach supports an incre-
mental development style: Initially, rules for the most general form are sup-
plied and later rules for special cases are added in order to improve perfor-
mance. Furthermore static typing discovers many errors during generation
time.
Second, the approach is designed to be very flexible and open. Every
combination of attribute processing from mere reading and matching of attri-
butes to complete evaluation can be expressed. The method allows tree modifi-
cations as well as mappings that produce arbitrary output. The explicit
description of execution order gives full control on side effects. The escape
to imperative programming and the easy integration with external subroutines
are essential loopholes. The tool as well as the generated code are efficient
and can be used in production projects. It is probably the combination of all
the mentioned properties and reasons that we regard this approach to be very
promising.
References
[AGT89]
A. V. Aho, M. Ganapathi and S. W. K. Tjiang, Code Generation Using Tree
Matching and Dynamic Programming, ACM Trans. Prog. Lang. and Systems 11,
4 (Oct. 1989), 491-516.
[BMS80]
R. Burstall, D. MacQueen and D. Sannella, HOPE: An Experimental
Applicative Language, Report CSR-62-80, Computer Science Department,
Edinburgh, 1980.
17
[CaC91]
I. H. Carmichael and J. R. Cordy, TXL - Tree Transformation Language,
Syntax and Informal Semantics, Dept. of Computing and Information
Sciences, Queens's Univeristy, Kingston, Apr. 1991.
[ClM84]
W. F. Clocksin and C. S. Mellish, Programming in Prolog, Springer Verlag,
Berlin, 1984.
[CoP90]
J. R. Cordy and E. Promislow, Specification and Automatic Prototype
Implementation of Polymorphic Objects in TURING using the TXL Dialect
Processor, Proc. IEEE 1990 International Conference on Computer
Languages, New Orleans, Mar. 1990, 145-154.
[ESL89]
H. Emmelmann, F. W. Schroer and R. Landwehr, BEG - a Generator for
Efficient Back Ends, SIGPLAN Notices 24, 7 (July 1989), 227-237.
[Gro89]
J. Grosch, Ag - An Attribute Evaluator Generator, Compiler Generation
Report No. 16, GMD Forschungsstelle an der Universitat Karlsruhe, Aug.
1989.
[GrE90]
J. Grosch and H. Emmelmann, A Tool Box for Compiler Construction, LNCS
477, (Oct. 1990), 106-116, Springer Verlag.
[Gro91a]
J. Grosch, Puma - A Generator for the Transformation of Attributed Trees,
Compiler Generation Report No. 26, GMD Forschungsstelle an der
Universitat Karlsruhe, July 1991.
[Gro91b]
J. Grosch, Tool Support for Data Structures, Structured Programming 12,
(1991), 31-38.
[HeS91]
R. Heckmann and G. Sander, Trafola-H Reference Manual, Prospectra Project
Report, Universitat des Saarlandes, Saarbrucken, 1991.
[IRD91]
IRDATA, Industrial Robot Data, DIN 66313, Beuth-Verlag, Berlin, 1991.
[IRL92]
IRL, Industrial Robot Language, DIN 66312, Beuth-Verlag, Berlin, to
appear, 1992.
[JoP90]
M. Jourdan and D. Parigot, Application Development with the FNC-2
Attribute Grammar System, LNCS 477, (Oct. 1990), 11-25, Springer Verlag.
18
[KHZ82]
U. Kastens, B. Hutt and E. Zimmermann, GAG: A Practical Compiler
Generator, Springer Verlag, Heidelberg, 1982.
[Knu68]
D. E. Knuth, Semantics of Context-Free Languages, Mathematical Systems
Theory 2, 2 (June 1968), 127-146.
[Knu71]
D. E. Knuth, Semantics of Context-free Languages: Correction,
Mathematical Systems Theory 5, (Mar. 1971), 95-96.
[Kos71]
C. H. A. Koster, Affix Grammars, in ALGOL 68 Implementation, J. E. L.
Peck (ed.), North Holland, Amsterdam, 1971, 95-109.
[Kos77]
C. H. A. Koster, CDL: A Compiler Implementation Language, LNCS 47,
(1977), 341-351, Springer Verlag.
[LMO88]
P. Lipps, U. Moncke, M. Olk and R. Wilhelm, Attribute (Re)evaluation in
OPTRAN, Acta Inf. 26, (1988), 213-239.
[LMW89]
P. Lipps, U. Moncke and R. Wilhelm, OPTRAN - A Language/System for the
Specification of Program Transformations, System Overview and
Experiences, LNCS 371, (1989), 52-65, Springer Verlag.
[MAE65]
J. McCarthy, P. W. Abrahams, D. J. Edwards, T. R. Hart and M. I. Levin,
Lisp 1.5 Programmer's Manual, MIT Press, Cambridge, MA, 1965.
[Mil85]
R. Milner, The Standard ML Core Language, Polymorphism 2, 2 (1985), .
[NAJ76]
K. V. Nori, U. Ammann, K. Jensen, H. H. Nageli and C. Jacobi, The
Pascal-P Compiler: Implementation Notes, Bericht 10, Eidgenossische
Technische Hochschule, Zurich, July 1976.
[Paa89]
J. Paakki, A Prolog-Based Compiler Writing Tool, in Proceedings of the
Workshop on Compiler Compiler and High Speed Compilation, D. Hammer
(ed.), Berlin, GDR, 1989, 107-117.
[Tur85]
D. A. Turner, Miranda: A Nonstrict Functional Language with Polymorphic
Types, LNCS 201, (1985), 1-16, Springer Verlag.
[Vol91]
J. Vollmer, The Compiler Construction System GENTLE, GMD-Arbeitspapier
Nr. 508, GMD Forschungsstelle an der Universitat Karlsruhe, Feb. 1991.
19
[WGS89]
W. M. Waite, J. Grosch and F. W. Schroer, Three Compiler Specifications,
GMD-Studie Nr. 166, GMD Forschungsstelle an der Universitat Karlsruhe,
Aug. 1989.
[Wat74]
D. A. Watt, Analysis Oriented Two Level Grammars, Ph. D. thesis,
University of Glasgow, Glasgow, 1974.
1
Contents
Abstract ........................................................ 1
Keywords ........................................................ 1
1. Introduction .................................................... 1
2. Tree Transformation by Pattern Matching ......................... 3
3. Specification Language .......................................... 6
3.1. Tree Structure .................................................. 6
3.2. Subroutines ..................................................... 7
3.3. Types ........................................................... 8
3.4. Rules ........................................................... 8
3.5. Patterns ........................................................ 9
3.6. Expressions ..................................................... 9
3.7. Statements ...................................................... 10
4. Implementation .................................................. 11
5. Related Research ................................................ 12
5.1. Imperative Programming .......................................... 12
5.2. Logic Programming ............................................... 12
5.3. Functional Programming .......................................... 12
5.4. Attribute Grammars .............................................. 13
5.5. Optran .......................................................... 13
5.6. Trafola ......................................................... 14
5.7. Codegenerator-Generators ........................................ 14
5.8. Txl ............................................................. 14
6. Experiences ..................................................... 15
7. Conclusions ..................................................... 15
References ...................................................... 16